DETAILED ACTION

Specification 

1. The title of the invention is not descriptive. A new title is required that is clearly
indicative of the invention to which the claims are directed.

Claim Objections 

2. Claim 10 is objected to because of the following informalities:
In claim 10, line 2, "the probability data" lacks proper antecedent basis.
Appropriate correction is required.

Claim Rejections - 35 USC § 102 

3. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that 
form the basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(b) the invention was patented or described in a printed publication in this or a foreign country or in public 
use or on sale in this country, more than one year prior to the date of application for patent in the United 
States. 

4. Claims 1-9, 11-13, 15, 16, and 20 are rejected under 35 U.S.C. 102(b) as being 
anticipated by Pon et al. (Pon, US 6,047,251). 

As per claims 1 and 2, Pon teaches a system for automatically determining a 
language of a document from a set of candidate languages, the system comprising:

logic (C.7.lines 33-35-his subroutine) for setting a negative assumption value
(C.7.lines 36, 37-his setting of an initial "confidence statistic") indicating the document is
not one of the candidate languages (ibid, C.7.lines 1, 2, C.7.lines 35-37-interpreted that
at a "zero" confidence level, the document is deemed not one of the candidate
languages);

an extractor for extracting a character string from the document (C.7.lines 38-40);

and 

a language analyzer (Fig. 4 item 106, Fig. 5) for determining a probability value
that the character string does not belong to the candidate languages (C.6.line 65-
C.7.line 22-the "statistic that indicates whether a selected word is in a chosen
language", wherein the "probability that a character string belongs to each of the
candidate languages" result inherently determines the value that a character string does
not belong, claim 2) and includes logic for adjusting the negative assumption value
based on the probability value (C.7.lines 39-41-his "updating"), the language analyzer
determining that the document is one language of the candidate languages when the
negative assumption value passes a threshold value (C.8.lines 1-4, his "region" as the
document, his current subzone for the region "is likely to be the language of the region",
C.8.lines 5-25-use of the threshold, C.9.lines 10-12-entire document).

As per claim 3, Pon teaches claim 2, and further teaches further including logic 
for retrieving the probability value from the probability data that corresponds to the 
character string (Fig. 5 item 130 and return logic). 

As per claim 4, Pon teaches claim 1, and further teaches further including an
information retrieval engine for retrieving documents in response to a search request,
the documents retrieved being analyzed by the language analyzer (C.4.lines 29-32-
downloading inherent to a search request, Fig. 2 item 52-identify language).

As per claim 5, Pon teaches claim 1, and further teaches wherein the logic for
adjusting includes logic for combining the negative assumption value (C.7.lines 37, 38-
the "initial value") with the probability value (C.7.lines 39-45-his "region confidence
factor" as a statistical value is added to the "initial" value).

As per claim 6, Pon teaches claim 1, and further teaches wherein the language
analyzer further includes iteration logic for causing the extractor to extract another
character string from the document if the negative assumption value fails to pass the
threshold value (Fig. 7-from the "select region r" item 158 to the "more regions" the loop
is interpreted as iteration logic, C.8.lines 12-25-the appending of another subzone,
which includes another character string from the document appended, C.7.lines 50-55-
"regions" and "zones" and words therein).

As per claims 7, 8 and 13, Pon teaches a method of determining a language of 
a document from a set of candidate languages, the method comprising the steps of: 

setting a null hypothesis to a true value for each candidate language indicating
the document is not in the candidate language and setting a false value (C.7.lines 36,
37-his setting of an initial "confidence statistic", C.7.lines 1, 2-his 1, as true, and 0, as
false, value, claim 13);

extracting a text string from the document (C.7.lines 38-40);

determining a contrary probability for each candidate language that the text string
does not belong to the candidate language (C.6.line 65-C.7.line 22-the "statistic that
indicates whether a selected word is in a chosen language", wherein the "probability
that a character string belongs to each of the candidate languages" result inherently
determines the value that a character string does not belong);



Application/Control Number: 09/884,403 Page 5 

Art Unit: 2654 

adjusting the null hypothesis for each candidate language with the contrary
probability corresponding to the candidate language (C.7.lines 39-45-his "updating"-
"value stored in the accumulator"-as the null hypothesis); and

determining the document is one language from the candidate languages when
the null hypothesis for the one language is disproved by approaching the false value
(C.7.lines 40-45-disproval interpreted the accumulation away from the true value above,
C.8.lines 1-4, his "region" as the document, his current subzone for the region "is likely
to be the language of the region", C.8.lines 5-25-use of the threshold, C.9.lines 10-12-
entire document, wherein the accumulation).
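
For illustration only, the method steps recited above for claims 7, 8 and 13 (set a null hypothesis for each candidate language, adjust it with a contrary probability for an extracted text string, and decide once the value for one language passes a threshold) can be sketched in Python. This is a minimal sketch under stated assumptions; it is neither the applicant's implementation nor the procedure of Pon. The function name, the table layout, the multiplicative update, the loop over further text strings, the default threshold of 0.05, and the handling of unknown strings are all hypothetical.

```python
def identify_language(text_strings, contrary_probs, languages, threshold=0.05):
    """Return the first language whose null-hypothesis value passes the threshold.

    contrary_probs maps (text_string, language) to the probability that the
    string does NOT belong to that language (hypothetical table layout).
    """
    # Null hypothesis "the document is not in this language", set to the true value 1.0.
    null_hypothesis = {lang: 1.0 for lang in languages}
    for s in text_strings:
        for lang in languages:
            # Assumption: adjust by multiplication; unknown strings use 1.0,
            # which leaves the value unchanged.
            null_hypothesis[lang] *= contrary_probs.get((s, lang), 1.0)
        # The language whose value is currently closest to the false value 0.0.
        best = min(null_hypothesis, key=null_hypothesis.get)
        if null_hypothesis[best] <= threshold:
            return best  # null hypothesis disproved for this language
    return None  # no null hypothesis was disproved by the available strings
```

Under these assumptions, each further text string can only move a language's value toward the false value, so the loop ends either when one language passes the threshold or when the strings are exhausted.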

As per claim 9, Pon teaches claim 8 and further teaches repeating the extracting
step for a different text string from the document and repeating the method until the null
hypothesis is disproved for one of the candidate languages by passing the threshold
value (Fig. 7-from the "select region r" item 158 to the "more regions" the loop is
interpreted as iteration logic, C.8.lines 12-25-the appending of another subzone, which
includes another character string from the document appended, C.7.lines 50-55-
"regions" and "zones" and words therein).

As per claim 11, Pon teaches claim 7 and further teaches identifying the
document based on a search request (C.4.lines 29-32-downloading inherent to a search
request, Fig. 2-subsequent processing).

As per claim 12, Pon teaches claim 7 and further teaches extracting a plurality of
sequential characters that form the text string (C.4.lines 64-67, C.7.lines 6-7-Examiner
interprets word to comprise sequential characters).

As per claims 15 and 16, claims 15 and 16 set forth limitations similar to claims
1 and 7, and therefore are rejected for the same reasons and under the same rationale.

As per claim 20, claim 20 sets forth limitations similar to claims 4 and 11, and
therefore is rejected for the same reasons and under the same rationale.

Claim Rejections - 35 USC § 103 

5. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

6. Claims 10, 14 and 17-19 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Pon in view of Elworthy (US 6,125,362). 

As per claim 10, Pon teaches claim 7 but lacks further teaching pregenerating 
probability data corresponding to each candidate language, the probability data 
including a probability value for a text string that is normalized based on an occurrence 
probability of the text string in all the candidate languages. 

However, Elworthy teaches pregenerating probability data corresponding to each 
candidate language (C.2.lines 30-35-his "classification" as the candidate language), the 
probability data including a probability value for a text string that is normalized based on 
an occurrence probability of the text string in all the candidate languages (ibid, his 
"determined probability that an element or group of elements belongs to a classification" 
is interpreted as occurrence probability, C.2.lines 30-38, the comparison with probability 
values interpreted as the normalization). Therefore, at the time of the invention, it would
have been obvious to modify Pon with Elworthy by using pregenerated data as a
probability step. The motivation for doing so would have been to develop an increasingly
accurate method of classifying data (C.2.lines 16-20).

Claim 17 sets forth limitations similar to claim 10, and is thus rejected for the 
same reasons, and under the same rationale, wherein Elworthy further teaches contrary 
probability of a character string in one language is determined based on an occurrence 
frequency of the character string in the one language influenced by a total occurrence 
frequency of the character string in all the candidate languages (C.8.lines 27-31-his
"tokens" as character strings, Fig. 14a, b, c, C.13.lines 43-58-wherein the "probability"
values inherently contain contrary probability values).

As per claim 18, Pon and Elworthy make obvious claim 17, Elworthy further 
teaches determining the occurrence frequency of each character string based on a 
sample set of documents provided for each of the candidate languages (C.7.line 65-
C.8.line 7).

As per claim 19, Pon and Elworthy make obvious claim 17, Elworthy further 
teaches wherein the contrary probability of the character string in one language is 
normalized by the total occurrence frequency of the character string in all the candidate 
languages (C.8.lines 27-31, C.10.line 15-C.11.line 37, especially C.10.lines 50-57-his
"frequency of all word tokens in M", and p(m) as the normalization).

As per claim 14, claim 14 sets forth limitations similar to claims 17, 18, and 19,
and is thus rejected for the same reasons, and under the same rationale. 
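
For illustration only, one possible reading of the limitations discussed above for claims 17-19 (occurrence frequencies taken from sample documents for each candidate language, and a contrary probability normalized by the total occurrence frequency of the string in all candidate languages) can be sketched in Python. This is a hedged sketch, not the applicant's method and not the procedure of Elworthy. The function names, the whitespace tokenization, the use of raw counts rather than relative frequencies, and the value returned for unseen strings are assumptions.

```python
from collections import Counter


def count_strings(samples):
    """Count word occurrences per language.

    samples maps a language name to a list of sample document texts
    (hypothetical input layout).
    """
    return {
        lang: Counter(word for doc in docs for word in doc.lower().split())
        for lang, docs in samples.items()
    }


def contrary_probability(string, lang, freqs):
    """Probability that `string` does NOT belong to `lang`.

    The count in `lang` is normalized by the total count of the string
    across all candidate languages, then subtracted from 1.
    """
    total = sum(counter[string] for counter in freqs.values())
    if total == 0:
        return 1.0  # assumption: a string seen in no language gives no evidence
    return 1.0 - freqs[lang][string] / total
```

Under these assumptions, a string that appears equally often in the samples of two languages yields a contrary probability of 0.5 for each, while a string seen in only one language yields 0.0 for that language and 1.0 for the others.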

Conclusion

7. The prior art made of record and not relied upon is considered pertinent to
applicant's disclosure.

van den Akker (US 6,415,250) teaches automatically identifying a
language using predetermined portions of words and probabilistic
methods.

Martino et al. (US 6,002,998) teaches determining languages from 
text by probabilistic word tables, and word comparison. 

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Lamont M. Spooner whose telephone number is 
571/272-7613. The examiner can normally be reached on 8:00 AM - 5:00 PM. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Richemond Dorvil can be reached on 571/272-7602. The fax phone number 
for the organization where this application or proceeding is assigned is 703-872-9306. 

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). 